HTTrack Website Copier
Open Source offline browser

HTTrack Programming page - plugging functions


You can write external functions to be plugged in the httrack library very easily. We'll see there some examples.

The httrack commandline tool allows (since the 3.30 release) to plug external functions to various callbacks defined in httrack.
See also: the httrack-library.h prototype file, and the callbacks-example.c given in the httrack archive.

Example: httrack --wrapper check-html=callback:process_file ..
With the callback.so (or callback.dll) module defined as below:
int process_file(char* html, int len, char* url_adresse, char* url_fichier) {
  printf("now parsing %s%s..\n", url_adresse, url_fichier);
  strcpy(currentURLBeingParsed, url_adresse);
  strcat(currentURLBeingParsed, url_fichier);
  return 1;  /* success */
}
Below the list of callbacks, and associated external wrappers:
typedef void (* t_hts_htmlcheck_filesave2)();
"callback name"callback descriptioncallback function signature
"init"Note: deprecated, should not be used anymore (unsafe callback) - see "start" callback or wrapper_init() module function below this table.Called during initialization ; use of htswrap_add (see httrack-library.h) is permitted inside this function to setup other callbacks.
return value: none
void (* myfunction)(void);
"free"Note: deprecated, should not be used anymore (unsafe callback) - see "end" callback or wrapper_exit() module function below this table.
Called during un-initialization
return value: none
void (* myfunction)(void);
"start"Called when the mirror starts. The opt structure passed lists all options defined for this mirror. You may modify the opt structure to fit your needs. Besides, use of htswrap_add (see httrack-library.h) is permitted inside this function to setup other callbacks.
return value: 1 upon success, 0 upon error (the mirror will then be aborted)
int (* myfunction)(httrackp* opt);
"end"Called when the mirror ends
return value: 1 upon success, 0 upon error (the mirror will then be considered aborted)
int (* myfunction)(void);
"change-options"Called when options are to be changed. The opt structure passed lists all options, updated to take account of recent changes
return value: 1 upon success, 0 upon error (the mirror will then be aborted)
int (* myfunction)(httrackp* opt);
"check-html"Called when a document (which may not be an html document) is to be parsed. The html address points to the document data, of lenth len. The url_adresse and url_fichier are the address and URI of the file being processed
return value: 1 if the parsing can be processed, 0 if the file must be skipped without being parsed
int (* myfunction)(char* html,int len,char* url_adresse,char* url_fichier);
"preprocess-html"Called when a document (which is an html document) is to be parsed (original, not yet modified document). The html address points to the document data address (char**), and the length address points to the lenth of this document. Both pointer values (address and size) can be modified to change the document. It is up to the callback function to reallocate the given pointer (using standard C library realloc()/free() functions), which will be free()'ed by the engine. Hence, return of static buffers is strictly forbidden, and the use of strdup() in such cases is advised. The url_adresse and url_fichier are the address and URI of the file being processed
return value: 1 if the new pointers can be applied (default value)
int (* myfunction)(char** html,int* len,char* url_adresse,char* url_fichier);
"postprocess-html"Called when a document (which is an html document) is parsed and transformed (links rewritten). The html address points to the document data address (char**), and the length address points to the lenth of this document. Both pointer values (address and size) can be modified to change the document. It is up to the callback function to reallocate the given pointer (using standard C library realloc()/free() functions), which will be free()'ed by the engine. Hence, return of static buffers is strictly forbidden, and the use of strdup() in such cases is advised. The url_adresse and url_fichier are the address and URI of the file being processed
return value: 1 if the new pointers can be applied (default value)
int (* myfunction)(char** html,int* len,char* url_adresse,char* url_fichier);
"query"Called when the wizard needs to ask a question. The question string contains the question for the (human) user
return value: the string answer ("" for default reply)
char* (* myfunction)(char* question);
"query2"Called when the wizard needs to ask a questionchar* (* myfunction)(char* question);
"query3"Called when the wizard needs to ask a questionchar* (* myfunction)(char* question);
"loop"Called periodically (informational, to display statistics)
return value: 1 if the mirror can continue, 0 if the mirror must be aborted
int (* myfunction)(lien_back* back,int back_max,int back_index,int lien_tot,int lien_ntot,int stat_time,hts_stat_struct* stats);
"check-link"Called when a link has to be tested. The adr and fil are the address and URI of the link being tested. The passed status value has the following meaning: 0 if the link is to be accepted by default, 1 if the link is to be refused by default, and -1 if no decision has yet been taken by the engine
return value: same meaning as the passed status value ; you may generally return -1 to let the engine take the decision by itself
int (* myfunction)(char* adr,char* fil,int status);
"check-mime"Called when a link download has begun, and needs to be tested against its MIME type. The adr and fil are the address and URI of the link being tested, and the mime string contains the link type being processed. The passed status value has the following meaning: 0 if the link is to be accepted by default, 1 if the link is to be refused by default, and -1 if no decision has yet been taken by the engine
return value: same meaning as the passed status value ; you may generally return -1 to let the engine take the decision by itself
int (* myfunction)(char* adr,char* fil,char* mime,int status);
"pause"Called when the engine must pause. When the lockfile passed is deleted, the function can return
return value: none
void (* myfunction)(char* lockfile);
"save-file"Called when a file is to be saved on disk
return value: none
void (* myfunction)(char* file);
"save-file2"Called when a file is to be saved or checked on disk
The hostname, filename and local filename are given. Two additional flags tells if the file is new (is_new) and is the file is to be modified (is_modified).
(!is_new && !is_modified): the file is up-to-date, and will not be modified
(is_new && is_modified): a new file will be written (or an updated file is being written)
(!is_new && is_modified): a file is being updated (append)
(is_new && !is_modified): an empty file will be written ("do not recatch locally erased files")
return value: none
void (* myfunction)(char* hostname,char* filename,char* localfile,int is_new,int is_modified);
"link-detected"Called when a link has been detected
return value: 1 if the link can be analyzed, 0 if the link must not even be considered
int (* myfunction)(char* link);
"transfer-status"Called when a file has been processed (downloaded, updated, or error)
return value: must return 1
int (* myfunction)(lien_back* back);
"save-name"Called when a local filename has to be processed. The adr_complete and fil_complete are the address and URI of the file being saved ; the referer_adr and referer_fil are the address and URI of the referer link. The save string contains the local filename being used. You may modifiy the save string to fit your needs, up to 1024 bytes (note: filename collisions, if any, will be handled by the engine by renaming the file into file-2.ext, file-3.ext ..).
return value: must return 1
int (* myfunction)(char* adr_complete,char* fil_complete,char* referer_adr,char* referer_fil,char* save);
"send-header"Called when HTTP headers are to be sent to the remote server. The buff buffer contains text headers, adr and fil the URL, and referer_adr and referer_fil the referer URL. The outgoing structure contains all information related to the current slot.
return value: 1 if the mirror can continue, 0 if the mirror must be aborted
int (* myfunction)(char* buff, char* adr, char* fil, char* referer_adr, char* referer_fil, htsblk* outgoing);
"receive-header"Called when HTTP headers are recevived from the remote server. The buff buffer contains text headers, adr and fil the URL, and referer_adr and referer_fil the referer URL. The incoming structure contains all information related to the current slot.
return value: 1 if the mirror can continue, 0 if the mirror must be aborted
int (* myfunction)(char* buff, char* adr, char* fil, char* referer_adr, char* referer_fil, htsblk* incoming);


Below additional function names that can be defined inside the module (DLL/.so):
"module function name"function description
int function-name_init(char *args);Called when a function named function-name is extracted from the current module (same as wrapper_init). The optional args provides additional commandline parameters. Returns 1 upon success, 0 if the function should not be extracted.
int wrapper_init(char *fname, char *args);Called when a function named fname is extracted from the current module. The optional args provides additional commandline parameters. Besides, use of htswrap_add (see httrack-library.h) is permitted inside this function to setup other callbacks. Returns 1 upon success, 0 if the function should not be extracted.
int wrapper_exit(void);Called when the module is unloaded. The function should return 1 (but the result is ignored).


Below additional function names that can be defined inside the optional libhttrack-plugin module (libhttrack-plugin.dll or libhttrack-plugin.so) searched inside common library path:
"module function name"function description
void plugin_init(void);Called if the module (named libhttrack-plugin.(so|dll)) is found in the library path. Use of htswrap_add (see httrack-library.h) is permitted inside this function to setup other callbacks.